Abstract

Most centralities proposed for identifying influential spreaders on social networks, whether
to spread a message or to stop an epidemic, require the full topological information
of the network on which spreading occurs. In practice, however, collecting all
connections between agents in a social network is hardly achievable. As a result, such
metrics can be difficult to apply to real social networks. A new approach for identifying
influential people without explicit network information is therefore needed to provide
a practical and efficient immunization or spreading strategy. In this study, we seek a way
of finding influential spreaders by using the social mechanisms of how social connections
are formed in real networks. We find that a reliable immunization scheme can be achieved
by asking people how they interact with each other. From these surveys we find that the
probabilistic tendency to connect to a hub has the strongest predictive power for
influential spreaders among the tested social mechanisms. Our observation also suggests
that people who connect different communities are more likely to be influential spreaders
when a network has a strong modular structure. Our finding implies that not only the
effect of network location but also the behavior of individuals is important for designing
optimal immunization or spreading schemes.


Author Summary 
Introduction 


Identifying influential spreaders on social networks is crucial for practical applications
in real-world epidemic and information spreading. For instance, superspreaders
need to be immunized with the highest priority in order to prevent the pandemic of an
infectious disease [6-10]. They are also important for the spreading of information in viral
marketing [4,5,11,12]. To this end, several predictors for influential spreaders based on
the topological properties of complex networks, including high degree [6,13],
k-core [8,14,15], betweenness centrality [16], PageRank [17], and many others [18],
were tested for identifying influential spreaders [8-10].


Most studies, however, have overlooked how to apply these predictors to real-world social
systems, which is a serious problem in practice. Most proposed centralities, except the
degree, which is a local centrality, require information on the whole network
structure. But collecting this information is nearly impracticable in real social systems.
Specifically, information gathered on relationships among individuals is inevitably
incomplete and erroneous [19], since it can only be collected for a partial sample of
a whole population. Thus, searching for influential spreaders with these centralities
may not be feasible for real-world spreading phenomena. On the other hand, if all
connections in a network are accessible, the influence of a single node can be measured
directly by simulating a model on the network, which obviates the need for
predicting influential spreaders. Consequently, in reality most predictors proposed for
influential spreaders are either inapplicable or unnecessary.

Thus, more realistic approaches based on the characteristics of people, such as their
behaviors, are needed for predicting influential people without explicit
information on the network structure. The benefit of such a method is that it is easily
applicable to any kind of social network, since one can obtain the probabilistic actions of
agents from a survey of the population. Through a survey, we can estimate the
probabilistic tendency of how connections are established for each individual, for
instance, how probable it is to make a new friend through an introduction from another
friend, or how frequently new friends are made from different groups. We find that these
human actions have a large influence on the subsequent spreading of information and
therefore can be a reliable predictor of a node's importance in a future epidemic or in a
viral marketing campaign that targets people identified by their probabilistic actions. In
addition, a ranking obtained from surveys can also be applied to situations where
information is accessible for only some people.

The social mechanisms of link formation that drive the evolution of networks have long
been studied in order to explain and predict complex phenomena in society. A
number of social mechanisms for establishing connections have been proposed in
sociology [20,21]. Thanks to the detailed records in online social networks that capture
the actions of every individual, it is now possible to quantify the frequency of occurrence
of different types of mechanisms by directly observing social interactions [22]. Thus,
recently, the frequencies of the social mechanisms for each person in a social network
have been revealed from the full log of the activity in online social networks [22].


In this paper, we propose an approach to identify influential spreaders based on
surveys on human behavior and social mechanisms that can be given to a population
without explicit information on the network. We decode the relation between people's
characteristics that can be obtained by a survey and their influence in spreading, using
real-world datasets that contain the full information of network evolution. Through
the analysis of large-scale evolving networks, we identify the effect of microscopic
link formation on macroscopic consequences in spreading. We find that the tendency
to connect to a hub can facilitate epidemic spreading and thus can be a reliable
predictor of people's importance in future epidemics or viral marketing campaigns. We
also find that people who frequently connect different communities are more
likely to be influential spreaders when a network is composed of strongly
connected modules. This research has considerable practical implications, since our
findings can be applied in reality using only the tendencies of individuals' behaviors.
Furthermore, our results provide a guideline for the public on how to
behave at the beginning stage of an epidemic.


Materials and Methods 

Social mechanism 

In this paper, the social mechanisms refer to the probabilistic tendency of each
kind of interaction among people in a given social network. The social mechanisms do
not directly indicate the motivation behind link creation, because several different
mechanisms may result in the same type of link formation and link formation may not
be motivated by the structure alone [22]. In addition, these mechanisms are not
mutually exclusive, because a link can be established by multiple different
mechanisms. For instance, a newly created link can appear following balance and
exchange interactions at the same time.

We use four classes of social mechanisms underlying link creation on a network,
based on the multitheoretical multilevel formalism [20] proposed in sociology: (1)
Exchange interaction corresponds to a newly formed reciprocal link, meaning that a new
link is established in the opposite direction of an existing link. (2) Balance interaction
corresponds to a newly formed tie that closes a triangle by a directed edge. (3) Collective
action (or preferential attachment [23]) corresponds to a link that connects to
well-connected people. Specifically, in this study, we measure the extent of the
collective action of each link as a continuous value using the cumulative probability
$F(k_i)$ of the excess degree distribution for a newly connected neighbor $i$. Here,

$$F(k_i) = \sum_{k_j < k_i} \frac{k_j \, q(k_j)}{\langle k \rangle},$$

where $q(k)$ is the degree distribution of a network and $\langle k \rangle$
represents the average degree of a network. (4) Structural hole interaction considers a
newly created link that connects two different modules (communities). Community
structure is identified by the local version of the link community detection method when
a new link is established [see details in Text S2].
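The first three mechanisms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the adjacency layout (`out_nbrs`, `in_nbrs`), and the choice to ignore edge direction when looking for a common neighbor in the balance test are assumptions made for illustration. The collective action value follows the cumulative excess degree probability described above, evaluated on an empirical list of degrees.

```python
from collections import defaultdict


def add_link(out_nbrs, in_nbrs, src, dst):
    """Record the directed link src -> dst (hypothetical helper)."""
    out_nbrs[src].add(dst)
    in_nbrs[dst].add(src)


def classify_new_link(out_nbrs, in_nbrs, src, dst):
    """Label a new link src -> dst against the network *before* it is added.

    Returns (exchange, balance) as booleans.
    """
    # Exchange: the reverse link dst -> src already exists.
    exchange = src in out_nbrs[dst]
    # Balance: src and dst already share a neighbor, so the new link closes
    # a triangle. Assumption: edge direction is ignored for this test.
    nb_src = out_nbrs[src] | in_nbrs[src]
    nb_dst = out_nbrs[dst] | in_nbrs[dst]
    balance = bool((nb_src & nb_dst) - {src, dst})
    return exchange, balance


def collective_action(degrees, k_new):
    """F(k_new) = sum over k < k_new of k q(k) / <k>.

    `degrees` is the list of current node degrees, so k q(k) / <k> summed
    over k equals (sum of degrees below k_new) / (sum of all degrees).
    """
    total = sum(degrees)
    if total == 0:
        return 0.0
    return sum(k for k in degrees if k < k_new) / total
```

A link to the best-connected node of a network gives a value close to one, while a link to a node of minimal degree gives zero.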

These social mechanisms are assigned on an evolving network at the moment when
a link is newly added, following the analysis developed in [^. While constructing the
evolving network by adding new connections in sequential order, we assign each
connection to the corresponding social mechanisms based on the network configuration at
the given moment. After all links are formed, the frequencies of social mechanisms of
the origin node $i$, $a_i^{\mathrm{ex}}$, $a_i^{\mathrm{bal}}$, $a_i^{\mathrm{ca}}$, and
$a_i^{\mathrm{sh}}$, where $i$ is the node index, are defined as the
number of neighbors that were connected by the corresponding mechanism, respectively
exchange, balance, collective action (for which the extents of collective action of all
connected nodes are summed), and structural hole, normalized by the total number of neighbors.
Specifically, the frequency $a_i^{\alpha}$ of social mechanism $\alpha$ for node $i$ is defined as
$a_i^{\alpha} = n_i^{\alpha} / k_i^{\mathrm{out}}$, where $n_i^{\alpha}$ is the number of links
formed corresponding to social mechanism $\alpha$
and $k_i^{\mathrm{out}}$ is the outdegree (the total number of new connections). Therefore, each
variable ranges from zero to unity, and as $a_i^{\alpha}$ increases, the corresponding social
interaction is more frequent.
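As a small illustration of this normalization, the sketch below computes the four frequencies for one origin node from a list of its labeled outgoing links. The record layout (keys `ex`, `bal`, `ca`, `sh`) and the function name are hypothetical; boolean labels count as 0 or 1, and the collective action entry is the continuous value in [0, 1].

```python
def mechanism_frequencies(links):
    """Frequencies a^alpha = n^alpha / k_out for one origin node.

    `links` is a list of dicts, one per outgoing link, with boolean
    entries 'ex', 'bal', 'sh' and a float 'ca' in [0, 1] (assumed layout).
    Returns None when the node created no links.
    """
    k_out = len(links)
    if k_out == 0:
        return None
    return {
        name: sum(float(link[name]) for link in links) / k_out
        for name in ("ex", "bal", "ca", "sh")
    }
```

The same function could be fed with survey answers instead of logged links, with one record per listed contact.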

We stress here that the extent of the social mechanisms of link creation for each
individual can be estimated in a real setting by surveys given to the population. For
instance, one could first ask people to list their contacts and then, as a second stage, ask
questions about each contact [^. For example, we could ask questions such as: (exchange)
did the person contact you first? (balance) did the person have common friends with
you when you contacted him/her? (collective action) did the person have a lot of
contacts when you contacted him/her? (structural hole) did the person belong to a different
group from yours when you contacted him/her? Therefore, an estimate of $a_i^{\alpha}$ for each
individual can be obtained from surveys conducted on the population. In
contrast, most centralities, including the k-shell index [^, betweenness centrality [16], and
PageRank [17], cannot be obtained in this way since they require global network
information.


Data sets 


We examine two social networks of Internet dating services in Sweden [22,26] and the
forum of Internet-mediated prostitution in Brazil [27]. These social networks represent
potential pathways for epidemic spreading, including sexually transmitted diseases. We
use the data of the largest site, qx.se, for Nordic homosexual, bisexual, and transgender
people in 2006 (QX). Actions of every individual in the community, including adding an
individual to the favorite list and guestbook signing, were recorded for two months
starting from Nov. 2005. Among the many activities, we use adding to favorite lists (QXF)
and signing guestbooks (QXG). We also analyze the pussokram.com dataset
(POK) [26], which was a Swedish online dating site for friendship, including flirting and
non-romantic relations. The data contain a full log for 512 days starting from the day
when the community was created in 2011. The POK network that we use in this study
consists of message senders, receivers, and the timing of interactions in the community.
Internet-mediated prostitution data (PRO) [27] come from a Brazilian online forum
where sex-buyers evaluate prostitutes. We construct the PRO network by connecting
sex-sellers with buyers. Since the PRO network is an undirected and bipartite graph,
the exchange and balance interactions are not defined. In order to investigate the
problem of identifying influential spreaders of information, we study the citation
network in the posts of an online network service, livejournal.com (LJ), for information
spreading on social networks [10]. Note that the QX data already contain a large part
of the network (85 % and 87 % for QXF and QXG, respectively), whereas the others start
at time t = 0. Table 1 gives the basic information of the datasets.

Table 1. Properties of real-world networks used in this study.

| Network          | Name | Number of nodes | $\langle k \rangle$ | Modularity [28] |
|------------------|------|-----------------|---------------------|-----------------|
| QX.com favorite  | QXF  | 80,407          | 13.07               | 0.4060          |
| QX.com guestbook | QXG  | 59,854          | 7.10                | 0.3893          |
| POK.com          | POK  | 29,242          | 5.95                | 0.3992          |
| Livejournal.com  | LJ   | 315,936         | 3.56                | 0.6578          |
| Prostitution     | PRO  | 16,729          | 4.67                | 0.6294          |

$\langle k \rangle$ is the average degree of the network. We use the fast-greedy community
detection algorithm [28] for measuring modularity.

We can reconstruct the evolving connections of the networks by following the precise time
at which each tie was established, in contrast to observing static snapshots of
networks. In our datasets, we can observe every step of the evolution of the social
networks through the time stamps of link creation. We stress that precise information on
the temporal evolution is essential for identifying the social mechanisms for each link. The
social mechanisms should be defined at the moment when a new link is established [22].
Accumulated static networks do not keep the temporal order in which links were established
and are thus misleading about the social interactions. In this regard, our datasets, which
contain the full log of network evolution, allow us to define social mechanisms properly.


Influential spreader 

In order to assess the influence of people on epidemic spreading, we use the epidemic
size $M_i$ originating from a seed $i$ in the susceptible-infected-recovered (SIR) model on
the finally accumulated network [^. The SIR model has long been used to describe infectious
disease [29]. At the same time, the SIR model is a plausible model of
information spreading [^. In the SIR model, each node can be in one of three states:
susceptible, infected, or recovered (or removed). Initially, all nodes are in the
susceptible state except for a single node in the infected state. At each time step, an
infected node spreads the disease or information to a susceptible neighbor with infection
probability $\beta$. At the steady state, we measure $M_i$ as the fraction of finally infected
nodes. We define a node with high $M_i$ as highly influential.
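The measurement of $M_i$ can be sketched with a simple discrete-time simulation. The sketch below is only an illustration under stated assumptions: every infected node tries once to infect each susceptible neighbor with probability `beta` and then recovers after a single time step, the network is given as an undirected adjacency dictionary, and the function names and the number of runs are hypothetical choices.

```python
import random


def sir_epidemic_size(adj, seed, beta, rng):
    """One SIR run from `seed`; returns the fraction of nodes ever infected.

    Assumption: an infected node recovers after exactly one time step.
    """
    infected = {seed}
    recovered = set()
    while infected:
        new_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = (
                    v not in infected
                    and v not in recovered
                    and v not in new_infected
                )
                if susceptible and rng.random() < beta:
                    new_infected.add(v)
        recovered |= infected
        infected = new_infected
    return len(recovered) / len(adj)


def mean_epidemic_size(adj, seed, beta, runs=100, rng_seed=0):
    """Average epidemic size M_i over independent runs from the same seed."""
    rng = random.Random(rng_seed)
    total = sum(sir_epidemic_size(adj, seed, beta, rng) for _ in range(runs))
    return total / runs
```

Ranking nodes by `mean_epidemic_size` then gives the reference influence against which survey-based predictors can be compared.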

We choose the infection probability $\beta$ to be a value for which spreading covers a small
part of the network, $\beta > \beta_c$, where $\beta_c$ is the epidemic threshold for
percolation [3,29]. When $\beta \gg \beta_c$, all seeds produce similar epidemic sizes because
spreading can cover almost the whole network regardless of where it originated [31].

Figure 1. The difference $\Delta R^2(x)$ of the coefficient of determination when a
variable $x$ is excluded in the regression analysis of (a) Eq. (1) and (b) Eq. (2)
in the QXF network. (a) The k-shell index shows the largest drop of $R^2$, showing the
strongest predictive power for influential spreaders. However, the k-shell index is difficult
to obtain since it requires the full topological information of the network. Although the
degree and social mechanisms $a^{\alpha}$ show smaller predictive power than the k-shell index,
they can be easily obtained from surveys and have considerable implications in a real setting.
(b) The degree shows the largest difference among the degree and social mechanisms that
can be obtained from surveys. Next, among the social mechanisms, collective action
shows the largest drop of $R^2$. Thus, collective action is a more reliable predictor than
the others from the human behavioral point of view.


Results 


Predictor for influential spreaders based on human activity. 

We recreate the entire network by adding all links in the temporal order in which they were
established. In order to assess systematically the relation of the epidemic influence
to the social mechanisms as well as topological metrics, we use multilinear regression
analysis [32] with the following model (Tables S1-S5):


$$M_i = c_0 + c_1 a_i^{\mathrm{ex}} + c_2 a_i^{\mathrm{bal}} + c_3 a_i^{\mathrm{ca}} + c_4 a_i^{\mathrm{sh}} + c_5 k_i + c_6 k_i^{S} + c_7 k_i^{\mathrm{sum}} + c_8 k_i^{2\mathrm{sum}} + \epsilon. \tag{1}$$

Here, $k_i$ is the degree of node $i$, $k_i^{S}$ is the k-shell index (Text S1),
$k_i^{\mathrm{sum}} = \sum_{j \in V(i)} k_j$ is the sum of the degrees of the nearest neighbors,
where $V(i)$ is the set of node $i$'s neighbors [^,
$k_i^{2\mathrm{sum}} = \sum_{j \in V_2(i)} k_j$ is the sum of the degrees of the next-nearest
neighbors, where $V_2(i)$ is the set of neighbors of node $i$'s neighbors [10], and
$\epsilon$ is the error term. We introduce the topological metrics since we are interested in
how much information we capture using the social mechanism tendencies
$\{a_i^{\mathrm{ex}}, a_i^{\mathrm{bal}}, a_i^{\mathrm{ca}}, a_i^{\mathrm{sh}}\}$ in comparison
with the more common topological measurements
$\{k_i, k_i^{S}, k_i^{\mathrm{sum}}, k_i^{2\mathrm{sum}}\}$. In order to avoid biased observation
due to the large fluctuation in the small-degree region, we exclude the data of people with
degree less than three from our analysis.

The k-shell index and its local proxies $k^{\mathrm{sum}}$ and $k^{2\mathrm{sum}}$ have been
regarded as efficient topological predictors for influential spreaders [8,10]. In agreement
with these previous studies, we find that $k^{S}$ can capture most of the fluctuation in the
epidemic size for the datasets. To quantify the effect of each variable, we measure the
difference $\Delta R^2(x)$ of the coefficient of determination when a variable $x$ is excluded.
In Fig. 1, the difference of the coefficient of determination is the largest when $k^{S}$ is
excluded from Eq. (1), which confirms the importance of $k^{S}$. In addition, more than
82.3 % of the fluctuations can be explained solely by the k-shell index for the QXF
network (Table S1). For the QXG, POK, LJ, and PRO networks, we find a similar
trend (Tables S2-S5). However, being a global quantity, the k-shell index
can be difficult to obtain, as discussed above. Therefore, $k^{S}$ is of limited applicability
to real social systems despite its strong correlation with the spreading influence.
$k^{\mathrm{sum}}$ or $k^{2\mathrm{sum}}$ also captures a large part of the variance in the data.
While these are local measurements, they can still be difficult to obtain because they require
the exact number of friends of friends at the time when the epidemic occurs [10].
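The quantity $\Delta R^2(x)$ can be illustrated with a dependency-free Python sketch: fit the regression with all predictors, refit without one predictor, and report the loss in the coefficient of determination. This is a generic ordinary least-squares illustration, not the authors' analysis pipeline. The function names are hypothetical, and the solver assumes that the predictors are not collinear.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def r_squared(rows, y):
    """R^2 of a least-squares fit of y on the columns of `rows` plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X^T X) c = X^T y.
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    Xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(p)]
    coef = solve_linear(XtX, Xty)
    mean_y = sum(y) / len(y)
    ss_res = sum(
        (yi - sum(c * v for c, v in zip(coef, x))) ** 2 for x, yi in zip(X, y)
    )
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def delta_r_squared(rows, y, names):
    """Drop in R^2 when each named predictor is excluded in turn."""
    full = r_squared(rows, y)
    drops = {}
    for j, name in enumerate(names):
        reduced = [[v for i, v in enumerate(r) if i != j] for r in rows]
        drops[name] = full - r_squared(reduced, y)
    return drops
```

A predictor with a large drop carries information that the remaining variables cannot replace, which is the sense in which Fig. 1 ranks the variables.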

The degree $k$ is not behavioral but, in contrast to $k^{S}$, $k^{\mathrm{sum}}$, and
$k^{2\mathrm{sum}}$, the degree $k$
can be estimated by surveying individuals with a simple question: how many friends do
you have? Therefore, even if we cannot determine the structure of the network, in many
cases we can access the information of the degree together with the other social
mechanisms, $a_i^{\alpha}$. Next, we are interested in the case where the topological location,
such as the k-shell, cannot be obtained for the reasons explained above. Therefore, we regress
the data of $M_i$ on the variables which can be easily obtained by surveys using the
following model, where $k^{S}$, $k^{\mathrm{sum}}$, and $k^{2\mathrm{sum}}$ are excluded:


$$M_i = c_0 + c_1 a_i^{\mathrm{ex}} + c_2 a_i^{\mathrm{bal}} + c_3 a_i^{\mathrm{ca}} + c_4 a_i^{\mathrm{sh}} + c_5 k_i + \epsilon. \tag{2}$$


When we consider Eq. (2), we can explain 63 % of the variance for the QXF network
(Table S6), demonstrating that surveys alone can capture a large share
of the variance. All variables in Eq. (2) can be easily obtained from surveys,
suggesting that we can rely on surveys for optimal immunization or viral marketing.

Using Eq. (2), we find that the degree is the most reliable predictor for
influential spreaders among the degree and $a_i^{\alpha}$. When the degree is excluded from
Eq. (2), the coefficient of determination $R^2$ drops by 0.49 from 0.62, showing the largest
difference (Fig. 1b). Since the degree represents the number of transmission
channels for a seed, the degree can play an important role in epidemic spreading on
networks, especially at the beginning stage of an outbreak [8,33]. When compared with the
topological location of people given by the k-shell, we find that the degree alone can
explain 58 % of the variance, which, compared to the value for the k-shell ($R^2 = 0.82$),
indicates that the degree is a worse predictor than the k-shell, in agreement with [^ . In a
real setting, however, the local degree can have more implications than the k-shell because it
can be easily obtained from surveys.

Next, we are interested in which social mechanisms $a_i^{\alpha}$ are more important for
spreading besides the local degree. This is important not only for optimal immunization
and information spreading but also for educating the population to avoid certain
behaviors that could spread diseases to a large population. In order to examine the effect
of the social mechanisms clearly, we study the deviation $\Delta M_i$ of the epidemic size from
the average epidemic size for people with the same degree, defined as

$$\Delta M_i = M_i - \frac{\sum_j \delta_{k_i,k_j} M_j}{\sum_j \delta_{k_i,k_j}}, \tag{3}$$

where $\delta_{ij}$ represents the Kronecker delta, which is 1 if the variables are
equal and 0 otherwise. $\Delta M_i$ quantifies the impact of the social mechanisms after
removing the effect induced by the degree, and thus identifies more clearly the important
social mechanisms for spreading among people with the same degree.
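Eq. (3) amounts to subtracting, from each node's epidemic size, the mean epidemic size of all nodes with the same degree. A minimal Python sketch follows; the function name and the dictionary-based inputs are assumptions for illustration.

```python
from collections import defaultdict


def degree_corrected_influence(degree, epidemic_size):
    """Delta M_i = M_i minus the mean M_j over nodes j with k_j = k_i.

    `degree` and `epidemic_size` map node -> k_i and node -> M_i.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, k in degree.items():
        sums[k] += epidemic_size[node]
        counts[k] += 1
    return {
        node: epidemic_size[node] - sums[k] / counts[k]
        for node, k in degree.items()
    }
```

By construction the values average to zero within each degree class, so any remaining trend with a social mechanism is not a pure degree effect.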

To compare the influence of each social mechanism in the spreading process, we
study the average size $\Delta M$ infected in an epidemic originating at people $i$ with a given
$(a_i^{\mathrm{ex}}, a_i^{\mathrm{bal}}, a_i^{\mathrm{ca}}, a_i^{\mathrm{sh}})$. The average
infected population over all the origins with the same pair of $(a^{\alpha}, a^{\beta})$ is

$$\Delta M(a^{\alpha}, a^{\beta}) = \sum_{j \in \Omega(a^{\alpha}, a^{\beta})} \frac{\Delta M_j}{N(a^{\alpha}, a^{\beta})}, \tag{4}$$

where $\Omega(a^{\alpha}, a^{\beta})$ is the set of all nodes with $(a^{\alpha}, a^{\beta})$ and
$N(a^{\alpha}, a^{\beta})$ is the number of nodes with $(a^{\alpha}, a^{\beta})$. In Fig. 2, we
find that $\Delta M$ increases with increasing $a^{\mathrm{ca}}$ regardless
of the other social mechanisms for all tested networks. This clear pattern suggests
that $a^{\mathrm{ca}}$ predicts the epidemic influence more reliably than the other social
interactions when we compare people with the same degree.

Figure 2. Collective action predicts influential spreaders more reliably than
other social mechanisms. The relative epidemic size $\Delta M(a^{\alpha}, a^{\beta})$ when
spreading originates in people with a given pair of social mechanism frequencies, for the
(a-c) QXF, (d-f) QXG, (g-i) POK, and (j-l) LJ networks. Collective action $a^{\mathrm{ca}}$
predicts the epidemic influence more reliably than the other social interactions when we
compare people with the same degree.

Figure 3. The effect of weak ties on spreading for different networks with
diverse modularity. The panel shows the slope of the frequency of structural hole
with respect to the epidemic influence $M_i$ in the regression analysis as a function of the
modularity of the underlying network. For networks with highly modular structure, such as LJ
and PRO, the frequency of structural hole is positively correlated with $M_i$.

The regression analysis of Eq. (2) also supports the importance of
collective action. When we remove $a^{\mathrm{ca}}$ from Eq. (2), the difference $\Delta R^2$ of
the coefficient of determination is the largest among the social mechanisms, which confirms the
importance of collective action. Since
people with high collective action are more likely to have many next-nearest neighbors,
they have a high chance of developing larger epidemic outbreaks. In contrast, people
with less collective action are likely to be located at the periphery of a network, leading
to a small impact on spreading. Thus, collective action is a reliable predictor
from the human behavioral point of view when we factor out popularity.


Strength of weak ties and community structure. 


So far, we have searched for the most influential spreaders based on the social mechanisms
$a^{\alpha}$, which can be obtained by surveys. In sociology, a long-standing hypothesis for
influential spreaders is the strength of weak ties [34]. According to the hypothesis, weak
ties, which bridge two densely connected modules formed by strong ties, play an important
role, especially in job changes in the labor market [34,35], in mobile communication
networks [36], and in the brain [37]. While this hypothesis may seem counter-intuitive,
from the perspective of information spreading weak ties are more likely to be a source of
fresh information, so weak ties can have a stronger effect than strong ties.
In this section, we test the weak tie hypothesis by observing the evolution of link
formation in large-scale real-world networks. We define a weak connection as a link
bridging two different communities at the time when the new link is formed, called a
structural hole. If weak ties play an important role in spreading processes, as the
weak tie hypothesis states, people with a high probability of structural hole interactions
are more likely to be influential in spreading. In order to test the effect of weak ties
(structural hole), we regress the data of $M_i$ on the social mechanism variables $a^{\alpha}$
using the following model, where the network properties $k^{S}$, $k^{\mathrm{sum}}$,
$k^{2\mathrm{sum}}$, and $k$ are excluded:

$$M_i = c_0 + c_1 a_i^{\mathrm{ex}} + c_2 a_i^{\mathrm{bal}} + c_3 a_i^{\mathrm{ca}} + c_4 a_i^{\mathrm{sh}} + \epsilon. \tag{5}$$

The degree $k$ is also excluded in order to focus on the effect of behavioral factors on
spreading.

From the regression analysis, we confirm that people with a high frequency of
structural hole interactions are more likely to be influential spreaders on the LJ and PRO
networks, as the weak tie hypothesis predicts. In the LJ and PRO networks, the frequency of
structural hole is positively related to the spreading influence $M_i$ with an extremely
small p-value (Fig. 3 and Tables S9 and S10). However, this pattern does not hold for
all social networks that we tested. For the QXF, QXG, and POK networks, $a^{\mathrm{sh}}$ is
negatively correlated with $M_i$, contrary to the weak tie hypothesis (Fig. 3 and
Tables S6-S8). This result suggests that the weak tie hypothesis may not be generically
valid for all social networks.

The validity of the weak tie hypothesis can depend on the underlying network where
spreading occurs. People with a high frequency of structural hole interactions can potentially
spread to different communities all at once. Therefore, if the underlying network of
spreading has a clear modular structure, the effect of weak ties is significant [38,39].
However, when community structure is less clear, the role of weak ties in spreading can be
weakened. In order to check this prediction, we compare the modularity of the networks
and the effect of weak ties (Fig. 3). When a network has strong community structure,
such as LJ and PRO, whose modularity is 0.658 and 0.629, respectively, the frequency of
structural hole is positively correlated with $M_i$. Therefore, the structural hole
mechanism can enhance the epidemic influence for networks with strong modular
structure, as the weak tie hypothesis predicts. However, the weak tie hypothesis is not valid
for networks with less clear modular structure. For instance, for the QXF, QXG, and POK
networks, which show lower modularity of around 0.4, $a^{\mathrm{sh}}$ plays a minor role in
spreading and is negatively correlated with $M_i$ (Fig. 3 and Tables S6-S8). If the modular
structure is not significant, the weak ties are not clearly defined, which decreases the effect
of weak ties. Thus, the weak tie hypothesis is expected to be valid for networks with strong
modular structure, not universally for all social networks. In conclusion, people who connect
different communities can be suspected of being influential when the underlying
network has a strong modular structure.


Discussion 


So far, most studies of spreading on complex networks have assumed that the network
structure is known. This means that full information on who is connected
with whom is required, which may not be obtainable in real settings. In agreement with
previous studies, we find that when information on the global structure of social
networks is available, it is beneficial for identifying influential spreaders in an epidemic
model, capturing up to 90 % of the variance with simple variables such as the k-shell [8,10].
In reality, however, it is difficult to gather the complete set of interactions among
people. Therefore, all the previous methods for finding influential spreaders based on the
network topology could be impractical. Searching for influential spreaders without
information on the network is essential in order to prevent a global pandemic and
minimize the cost of immunization.

Thus, we proposed a possible strategy for identifying influential spreaders by
using characteristics of people's behavior underlying the evolution of social networks.
Our findings provide several pragmatic lessons for efficient immunization strategies as
well as efficient information spreading campaigns. First, in the absence of the k-shell, the
degree is the first local quantity that can be used to predict influential spreaders.
Among the behavioral variables quantifying the social mechanisms $a^{\alpha}$, collective action
gives information complementary to the degree, so it is suitable as a strong indicator
of influential spreaders when comparing people with the same degree. Also, a
person with a high tendency to connect two different groups via weak ties can be
suspected of being an influential spreader when the network has a strong modular structure.
Our analysis provides not only an applicable survey-based scheme for identifying influential
spreaders but also a guideline for the public on how to behave
when an epidemic occurs. For instance, during the beginning stage of an epidemic, one should
avoid meeting popular people or people belonging to a different group, who could
spread diseases to a large population.


Supporting Information 

51 Table 

Multilinear regression for the QXF networks with Eq. (1) 

M, = Co + cia“= + C 2 a,^“' + 030 “ + C 4 af'’ + c^h + cekf + cyfcf + cgA:!"™ + e. 

52 Table 

Multilinear regression for the QXG networks with Eq. (1). 

M, = Co + cia“= + C 2 a,^“' + Cso^ + C 4 af'’ + Cgfc, + c^kf + Cyfcf+ cgA:!"™ + e. 

53 Table 

Multilinear regression for the POK networks with Eq. (1). 

M, = Co + cia“= + C2aJ“' + cso™ + C4af'‘ + c^h + c^kf + 07*^“™ + cg/cf"™ + e. 

54 Table 

Multilinear regression for the LJ networks with Eq. (1). 

M, = Co + ciar= + C 2 aJ“' + cgo^ + C 4 af + c^h + c^kf + crkf^^^ + cgA^f"™ + e. 

55 Table 

Multilinear regression for the PRO networks with Eq. (1). 

M, = Co + cia“= + C 2 a,^“' + Cgo^ + C 4 af'’ + c^k, + cekf + Cyfcf+ cgA:|“™ + e. 

56 Table 

Multilinear regression for the QXF networks with Eqs. (2) and (5). 

Mi = co + cia“° + C 2 aJ“' + cga™ + c^af^ + + e. 
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57 Table 

Multilinear regression for the QXG networks with Eqs. (2) and (5). 

Mi = Co + Cia“° + C 2 aj“' + 030 “ + C 4 af^ + c^ki + e. 

58 Table 

Multilinear regression for the POK networks with Eqs. (2) and (5). 

Mi = Co + cia“° + C 2 aJ“' + csa™ + C 4 af'‘ + c^ki + e. 

59 Table 

Multilinear regression for the LJ networks with Eqs. (2) and (5). 

Mi = Co + Cia“° + C 2 a\°‘’‘ + + c^al^ + c^h + e. 

SIO Table 

Multilinear regression for the PRO networks with Eqs. (2) and (5). 

Mi = Co + Cia“° + C 2 aj“' + Cso^ + c^af- + c^ki + e. 


51 Text 

A:-shell index. 

52 Text 

Identifying structural hole in the link community. 
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Supporting information 


Identifying strnctnral hole 

To identify intercommunity links, which we call structural holes, we use the link
community detection method proposed in [^]. We adapt the method to a local version
that uses only local information of the network, because global information is difficult
to obtain through surveys. The procedure is as follows. When a new link $e_{ik}$ is
added, we define, as in the original link community algorithm [^], the similarity
$S(e_{ik}, e_{jk})$ between the new link $e_{ik}$ and each existing link $e_{jk}$ as

$$S(e_{ik}, e_{jk}) = \frac{|n_+(i) \cap n_+(j)|}{|n_+(i) \cup n_+(j)|}, \tag{6}$$

where $n_+(i)$ is the set of neighbors of node $i$. The similarity is therefore high
when nodes $i$ and $j$ share many common friends. If the similarity is below a certain
threshold, meaning that the two neighbors share only a small fraction of common friends,
we judge the newly added link to be a structural hole.
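
The local test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy network, and the threshold value 0.3 are hypothetical, and the text does not say how the similarities to several existing links are combined, so the sketch assumes that a link is a structural hole only when its similarity to every other link at node k is below the threshold.

```python
def link_similarity(adj, i, j):
    """Jaccard similarity of the neighbor sets of nodes i and j, as in Eq. (6)."""
    union = adj[i] | adj[j]
    if not union:
        return 0.0
    return len(adj[i] & adj[j]) / len(union)


def is_structural_hole(adj, i, k, threshold):
    """Decide whether the newly added link i-k bridges two communities.

    adj maps each node to the set of its neighbors and already contains the
    new link. Assumption (not stated in the text): the link counts as a
    structural hole only if its similarity to every other link at k is
    below the threshold.
    """
    sims = [link_similarity(adj, i, j) for j in adj[k] if j != i]
    # With no other link at k there is nothing to compare against.
    return bool(sims) and max(sims) < threshold


# Toy network (made up): a dense group {a, b, c, k} and a triangle {x, y, z},
# joined only by the link x-k.
adj = {
    "a": {"b", "c", "k"},
    "b": {"a", "c", "k"},
    "c": {"a", "b", "k"},
    "k": {"a", "b", "c", "x"},
    "x": {"y", "z", "k"},
    "y": {"x", "z"},
    "z": {"x", "y"},
}

inside = is_structural_hole(adj, "a", "k", threshold=0.3)  # link inside the dense group
bridge = is_structural_hole(adj, "x", "k", threshold=0.3)  # link between the two groups
```

Only the neighbor sets of the nodes involved are needed, which is the kind of information a survey respondent could plausibly report.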


Epidemic size for a seed node with degree k on randomized 
networks 

The susceptible-infected-recovered (SIR) model on a network can be mapped onto a bond
percolation problem with link occupation probability $\beta$. From the perspective of
bond percolation, the epidemic size initiated by a single seed node is statistically the
same as the average size of the component containing the seed node. One can then obtain
the epidemic size initiated by a seed node of degree $k$ using the generating function
method [^][^]. Given the degree distribution $q(k)$ of a network, we define the degree
generating function as $G_0(x) = \sum_{k} q(k) x^k$. We also define the generating
function for the excess degree of a node reached by following a randomly chosen link
as $G_1(x) = \sum_{k} \frac{k q(k)}{\langle k \rangle} x^{k-1}$, where
$\langle k \rangle = \sum_k k q(k)$ is the mean degree. For locally tree-like networks,
the probability $u$ that a node reached by following a randomly chosen link does not
belong to the giant component is given by

$$u = \sum_{k=1}^{\infty} \frac{k q(k)}{\langle k \rangle} \left[1 + (u - 1)\beta\right]^{k-1} = G_1[1 + (u - 1)\beta]. \tag{7}$$

The probability $p_k$ that a randomly chosen node with degree $k$ belongs to the giant
component can be obtained as

$$p_k = 1 - \left[1 + (u - 1)\beta\right]^k. \tag{8}$$

We can also obtain the size $s$ of the giant component of a given network as
$s = 1 - G_0[1 + (u - 1)\beta]$. Finally, the average epidemic size $S_k$ initiated by a
node with degree $k$ is the product of $p_k$ and $s$, that is, $S_k = s\,p_k$.
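
As a numerical illustration of Eqs. (7) and (8), the sketch below solves the self-consistency equation for u by fixed-point iteration and then evaluates the epidemic size for each degree. It is a minimal sketch, not the authors' code: the function name, the iteration scheme, the tolerance, and the example degree distribution (a network in which every node has degree 3) are assumptions made for illustration.

```python
def epidemic_size_by_degree(q, beta, tol=1e-12, max_iter=100_000):
    """Return {k: S_k} for a degree distribution q = {k: q(k)} and occupation probability beta."""
    mean_k = sum(k * p for k, p in q.items())

    def g0(x):
        # Degree generating function G_0(x) = sum_k q(k) x^k.
        return sum(p * x**k for k, p in q.items())

    def g1(x):
        # Excess-degree generating function G_1(x) = sum_k k q(k) x^(k-1) / <k>.
        return sum(k * p * x ** (k - 1) for k, p in q.items() if k > 0) / mean_k

    # Fixed-point iteration for Eq. (7), started at u = 0 so that it
    # approaches the smallest non-negative solution.
    u = 0.0
    for _ in range(max_iter):
        u_next = g1(1.0 + (u - 1.0) * beta)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break

    x = 1.0 + (u - 1.0) * beta
    s = 1.0 - g0(x)                                # size of the giant component
    return {k: s * (1.0 - x**k) for k in q}        # S_k = s * p_k, with p_k from Eq. (8)


# Hypothetical example: every node has degree 3.
q_regular = {3: 1.0}
sizes_above = epidemic_size_by_degree(q_regular, beta=0.75)  # above the percolation threshold
sizes_below = epidemic_size_by_degree(q_regular, beta=0.2)   # below the percolation threshold
```

For this example Eq. (7) can be solved by hand at $\beta = 0.75$, which gives $u = 1/9$ and $S_3 = (26/27)^2$, so the numerical result can be checked against the closed form. Below the threshold the iteration converges to $u = 1$ and the epidemic size vanishes.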
Table 7. Multilinear regression for the QXF network.

$M_i = c_0 + c_1 a_i^{\mathrm{exchange}} + c_2 a_i^{\mathrm{balance}} + c_3 a_i^{\mathrm{coll.\,act.}} + c_4 a_i^{\mathrm{str.\,hole}} + c_5 k_i.$

| intercept | k | exchange | balance | coll. act. | str. hole | $R^2$ |
|---|---|---|---|---|---|---|
| 0.889*** [0.0432] | 0.0268*** [0.000205] | -0.416*** [0.0690] | ? [0.0688] | 2.89*** [0.103] | -3.79*** [0.211] | 0.625 |
| 0.808*** [0.0410] | 0.0270*** [0.000204] | - | 1.?4*** [0.0677] | 2.94*** [0.102] | -3.84*** [0.211] | 0.624 |
| 0.954*** [0.0433] | 0.0276*** [0.000201] | -0.209** [0.0685] | - | 3.20*** [0.102] | -4.34*** [0.210] | 0.617 |
| 1.86*** [0.0269] | 0.0265*** [0.000211] | -0.579*** [0.0708] | ?.4?*** [0.0696] | - | -2.63*** [0.213] | 0.602 |
| 0.868*** [0.0436] | 0.0272*** [0.000207] | -0.473*** [0.0675] | 1.32*** [0.0687] | 2.53*** [0.102] | - | 0.615 |
| 2.? [0.0631] | - | -1.39*** [0.104] | 3.26*** [0.102] | 2.06*** [0.156] | -6.12*** [0.319] | 0.133 |
| 2.15*** [0.0607] | - | - | 3.04*** [0.101] | 2.22*** [0.156] | -6.36*** [0.321] | 0.121 |
| 2.? [0.0647] | - | -0.843*** [0.107] | - | 2.95*** [0.159] | 8.??*** [0.326] | 0.0650 |
| 3.07*** [0.0373] | - | -1.50*** [0.105] | 3.50*** [0.101] | - | -5.28*** [0.314] | 0.121 |
| 2.39*** [0.0640] | - | -1.51*** [0.105] | 3.63*** [0.101] | 1.46*** [0.155] | - | 0.109 |
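
A multilinear regression of this form can be fitted by ordinary least squares. The sketch below is a generic, dependency-free illustration and not the procedure used to produce the tables: the function name and the toy data are made up, and it solves the normal equations directly, whereas a library routine such as `numpy.linalg.lstsq` would be the usual choice in practice. It also returns the coefficient of determination $R^2$, a common goodness-of-fit summary for such models.

```python
def ols_fit(X, y):
    """Least-squares fit of y = c_0 + sum_j c_j * x_j; returns (coefficients, R^2)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend the intercept column
    p = len(rows[0])

    # Normal equations (Z^T Z) c = Z^T y, stored as an augmented matrix.
    m = [
        [sum(r[a] * r[b] for r in rows) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(p)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution.
    coef = [0.0] * p
    for r in reversed(range(p)):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, p))
        coef[r] = (m[r][p] - tail) / m[r][r]

    # Coefficient of determination.
    pred = [sum(c * z for c, z in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Made-up data with two predictors, generated from y = 1 + 2*x1 - 3*x2.
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [1.0, 3.0, -2.0, 0.0, 2.0]
coef_toy, r2_toy = ols_fit(X_toy, y_toy)
```

Dropping one column of the predictor matrix and refitting gives the kind of reduced model reported in the rows where a predictor is marked as excluded.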
Table 8. Multilinear regression for the QXG network.

$M_i = c_0 + c_1 a_i^{\mathrm{exchange}} + c_2 a_i^{\mathrm{balance}} + c_3 a_i^{\mathrm{coll.\,act.}} + c_4 a_i^{\mathrm{str.\,hole}} + c_5 k_i.$

| intercept | k | exchange | balance | coll. act. | str. hole | $R^2$ |
|---|---|---|---|---|---|---|
| 0.63*** [0.0482] | 0.0600*** [0.000490] | 0.517** [0.0628] | 1.25*** [0.0815] | 3.24*** [0.127] | ? [0.174] | 0.684 |
| 0.? [0.0474] | 0.0597*** [0.000491] | - | 1.35*** [0.0808] | 3.53*** [0.122] | -2.38*** [0.174] | 0.682 |
| 0.657*** [0.0488] | 0.0611*** [0.000491] | 0.668*** [0.0629] | - | ? [0.125] | -3.00*** [0.172] | 0.675 |
| 1.46*** [0.0369] | 0.0608*** [0.000507] | 0.960*** [0.0626] | 1.? [0.0820] | - | -0.893*** [0.170] | 0.660 |
| 0.681*** [0.0486] | 0.0608*** [0.000493] | 0.496*** [0.0635] | 1.50*** [0.0804] | 2.64*** [0.121] | - | 0.677 |
| 2.10*** [0.0769] | - | -0.102*** [0.103] | 2.72*** [0.133] | 4.23*** [0.209] | -4.81*** [0.285] | 0.141 |
| 2.08*** [0.0756] | - | - | 2.??*** [0.131] | ? [0.201] | 4.?2*** [0.285] | 0.141 |
| 2.21*** [0.0785] | - | 0.208*** [0.104] | - | 5.31*** [0.207] | -6.22*** [0.283] | 0.100 |
| 3.21*** [0.0552] | - | 0.467*** [0.102] | 3.40*** [0.132] | - | -2.86*** [0.275] | 0.101 |
| 2.23*** [0.0778] | - | -0.161? [0.105] | 3.26*** [0.131] | 3.04*** [0.200] | - | 0.113 |




Table 9. Multilinear regression for the POK network.

$M_i = c_0 + c_1 a_i^{\mathrm{exchange}} + c_2 a_i^{\mathrm{balance}} + c_3 a_i^{\mathrm{coll.\,act.}} + c_4 a_i^{\mathrm{str.\,hole}} + c_5 k_i.$

| intercept | k | exchange | balance | coll. act. | str. hole | $R^2$ |
|---|---|---|---|---|---|---|
| 0.669*** [0.0414] | 0.0222*** [0.000362] | 0.691*** [0.0571] | 2.29*** [0.110] | 3.15*** [0.0987] | 1.34*** [0.126] | 0.264 |
| 0.859*** [0.0385] | 0.0223*** [0.000364] | - | 2.44*** [0.110] | 3.23*** [0.0989] | 1.3?*** [0.126] | 0.257 |
| 0.738*** [0.0418] | 0.0228*** [0.000365] | 0.817*** [0.0575] | - | 3.34*** [0.0995] | 1.37*** [0.127] | 0.245 |
| 1.07*** [0.0278] | 0.0221*** [0.000373] | 0.814*** [0.0587] | 2.62*** [0.113] | - | 0.140*** [0.121] | 0.219 |
| 0.713*** [0.0413] | 0.0223*** [0.000363] | 0.675*** [0.0573] | 2.39*** [0.110] | 2.77*** [0.0920] | - | 0.259 |
| 0.921*** [0.0456] | - | 0.760*** [0.0632] | 2.85*** [0.122] | 3.13*** [0.109] | ? [0.139] | 0.0977 |
| 1.??*** [0.0423] | - | - | 3.01*** [0.121] | 3.21*** [0.109] | -1.63*** [0.140] | 0.0898 |
| 1.02*** [0.0461] | - | 0.920*** [0.0639] | - | 3.36*** [0.111] | -1.96*** [0.141] | 0.0680 |
| ? [0.0303] | - | 0.882*** [0.0646] | ? [0.124] | - | -0.19? [0.133] | 0.0532 |
| 0.978*** [0.0455] | - | 0.741*** [0.0635] | 2.98*** [0.122] | 2.64*** [0.102] | - | 0.0899 |
Table 10. Multilinear regression for the LJ network.

$M_i = c_0 + c_1 a_i^{\mathrm{exchange}} + c_2 a_i^{\mathrm{balance}} + c_3 a_i^{\mathrm{coll.\,act.}} + c_4 a_i^{\mathrm{str.\,hole}} + c_5 k_i.$

| intercept | k | exchange | balance | coll. act. | str. hole | $R^2$ |
|---|---|---|---|---|---|---|
| ? [0.0207] | 0.00305*** [5.76e-5] | 1.30** [0.111] | 4.47*** [0.0513] | 0.804*** [0.0545] | 1.46*** [0.0549] | 0.195 |
| 1.26*** [0.0206] | 0.00305*** [5.76e-5] | - | 4.56*** [0.0507] | 0.787*** [0.0545] | 4.4?*** [0.0549] | 0.194 |
| 1.? [0.0219] | 0.00366*** [6.04e-5] | 2.88*** [0.116] | - | 2.52*** [0.0537] | 0.638*** [0.0571] | 0.103 |
| 1.4?*** [0.0129] | 0.00301*** [5.76e-5] | 1.26*** [0.111] | 4.? [0.0479] | - | 1.73*** [0.0512] | 0.193 |
| 1.23*** [0.0208] | 0.00304*** [5.79e-5] | 1.38*** [0.112] | 4.23*** [0.0508] | 1.33*** [0.0510] | - | 0.186 |
| 1.32*** [0.0211] | - | 1.32*** [0.113] | 4.80*** [0.0520] | 0.671*** [0.0556] | 1.43*** [0.0560] | 0.160 |
| 1.35*** [0.0210] | - | - | 4.89*** [0.0514] | 0.6534*** [0.0556] | 1.45*** [0.0561] | 0.159 |
| 1.27*** [0.0224] | - | 3.03*** [0.119] | - | 2.51*** [0.0551] | 0.531*** [0.0587] | 0.0505 |
| 1.52*** [0.0132] | - | 1.28** [0.113] | 5.02*** [0.0486] | - | 1.68*** [0.0523] | 0.158 |
| 1.? [0.0212] | - | 1.39** [0.114] | 4.56*** [0.0515] | 1.29*** [0.0520] | - | 0.152 |




Table 11. Multilinear regression for the PRO network.

$M_i = c_0 + c_1 a_i^{\mathrm{exchange}} + c_2 a_i^{\mathrm{balance}} + c_3 a_i^{\mathrm{coll.\,act.}} + c_4 a_i^{\mathrm{str.\,hole}} + c_5 k_i.$

| intercept | k | coll. act. | str. hole | $R^2$ |
|---|---|---|---|---|
| -0.405*** [0.0382] | 0.137*** [0.00128] | 3.83*** [0.0810] | 3.67*** [0.134] | 0.625 |
| -0.412*** [0.0398] | 0.139*** [0.00133] | - | 4.53*** [0.0800] | 0.594 |
| ? [0.0246] | 0.131*** [0.00142] | 5.65*** [0.142] | - | 0.533 |
| 0.891*** [0.0545] | - | 3.10*** [0.121] | 4.50*** [0.201] | 0.154 |
| 2.05*** [0.0309] | - | - | 6.07*** [0.198] | 0.093 |
| 0.906*** [0.0559] | - | 3.93*** [0.118] | - | 0.108 |